Creating a Concept Parser Project – Training & Project Data

Now that the initial four-step setup is complete, let’s look at the remaining steps.

Step 5: Add Reference Data to Sub-Concepts

Continuing from Step 4, the system loads the records containing the list of Sub-Concepts. Because this runs as a small job, the list may take a little time to appear; please do not close the screen while the job runs. Once the list has loaded, the user can add Reference Data (essentially Data Set columns) for these Sub-Concepts.

Adding Reference Data is optional. If you don’t have additional Reference Data for the Sub-Concepts, click ‘Skip Step & Continue’ to go straight to adding the Training Data.

To add Reference Data, select the Sub-Concepts on the screen above using the checkboxes and click the ‘Add Reference Data Source’ button. This opens a modal in which you add the Filtered Data Columns corresponding to the selected Sub-Concepts. The added columns then appear in the Reference Data Source column on the same screen.

Step 6: Select Training Data Sets

In this step, the user provides one or more Training Data Sets that contain the Classifier Sub-Concept and the corresponding Sub-Concept values. These Data Sets act as the training data because they already contain the parsed key-value pairs for each Sub-Concept.

Select Training Data Columns

Once the user has selected the Training Data Sets, they appear on the next screen, where the user chooses, for each Data Set:

  1. The Column which contains the Sub-Concepts

  2. The Column which contains the Sub-Concept values

For example, with the two Data Sets selected above, the screen shows two rows, one per Data Set, and the user selects both columns in each row, as shown in the image below.
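The column choices on this screen effectively tell the system, for each Data Set, where to find the Sub-Concept names and where to find their values. The Python sketch below illustrates that idea under stated assumptions: `ColumnMapping`, `collect_pairs`, the Data Set names, the column names, and the sample address rows are all hypothetical, and the product does not necessarily process Training Data this way internally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMapping:
    """Hypothetical model of one row on the Select Training Data Columns screen."""
    dataset: str
    sub_concept_column: str  # column that holds the Sub-Concept names
    value_column: str        # column that holds the Sub-Concept values


def collect_pairs(mappings, datasets):
    """Merge every Training Data Set into {Sub-Concept: [values, ...]}."""
    pairs = defaultdict(list)
    for m in mappings:
        for row in datasets[m.dataset]:
            pairs[row[m.sub_concept_column]].append(row[m.value_column])
    return dict(pairs)


# Two hypothetical Training Data Sets whose columns are named differently,
# so each one needs its own mapping row.
datasets = {
    "addresses_a": [
        {"part": "Street", "text": "12 High St"},
        {"part": "City", "text": "Leeds"},
    ],
    "addresses_b": [
        {"label": "Street", "value": "4 Mill Lane"},
        {"label": "Zip", "value": "LS1 4AP"},
    ],
}
mappings = [
    ColumnMapping("addresses_a", "part", "text"),
    ColumnMapping("addresses_b", "label", "value"),
]
print(collect_pairs(mappings, datasets))
# {'Street': ['12 High St', '4 Mill Lane'], 'City': ['Leeds'], 'Zip': ['LS1 4AP']}
```

Because the two hypothetical Data Sets use different column names, each needs its own mapping, which mirrors the one-row-per-Data-Set layout of the screen.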

Step 7: Add Data to Project

As in SOC projects, the last step is to add the actual Project Data on which the model will run. The Project Data must not include any of the Training Data, following the standard machine-learning practice of keeping the data a model is trained on separate from the data it is applied to.

This screen therefore lists the Data Sets to which the user has at least read rights and that contain the Concept to be parsed as a High Confidence tagged concept. Data Sets already used as Training Data still appear in the list but are disabled.
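The listing rule described above can be expressed as a small filter. This is a minimal Python sketch, assuming a hypothetical `DataSet` record with a read-rights flag and a set of High Confidence concept tags; it models the documented behavior only and is not the product’s implementation or API.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    """Hypothetical summary of a Data Set as seen by the Project Data screen."""
    name: str
    can_read: bool
    # Concepts tagged on this Data Set with High Confidence (assumed field).
    high_confidence_concepts: set[str] = field(default_factory=set)


def project_data_candidates(datasets, concept, training_names):
    """Return (name, disabled) pairs for the Data Sets the screen would list."""
    listed = []
    for ds in datasets:
        if not ds.can_read:
            continue  # the user needs at least read rights
        if concept not in ds.high_confidence_concepts:
            continue  # the Concept to parse must be a High Confidence tag
        # Training Data Sets are still listed, but marked as disabled.
        listed.append((ds.name, ds.name in training_names))
    return listed


# Hypothetical usage: "addresses_a" was used as Training Data.
datasets = [
    DataSet("addresses_a", True, {"Address"}),
    DataSet("customers_2023", True, {"Address", "Email"}),
    DataSet("hr_private", False, {"Address"}),
    DataSet("products", True, {"SKU"}),
]
print(project_data_candidates(datasets, "Address", {"addresses_a"}))
# [('addresses_a', True), ('customers_2023', False)]
```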

The user moves the Data Sets that have a High Confidence mapping to the Concept to be Parsed into the right panel. Once at least one Data Set is in the right panel, the user can click the Run Model button to start the Concept Parser project. This runs a Job, which can be viewed from the Jobs link in the left navigation.

This completes the Concept Parser creation process. Once the Job has completed, the project summary, Tasks, and related details appear on the Project Home screen.

System Validations

  1. The user cannot create a Classifier with the same name as an existing concept in the Semantic Object in which the Classifier is being created.

  2. The prerequisites listed at the beginning of the section must be completed.

  3. The system does not allow Data Sets that were used for training to be used as Project Data.

  4. On the Project Data screen, only Data Sets that contain both the Concept to be parsed and the Classifier can be moved to the right panel; otherwise, the system displays a warning.
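Validations 1, 3, and 4 can be sketched as simple checks. The functions below are hypothetical illustrations of the documented rules, not calls the product exposes; the exact-match name comparison in particular is an assumption, since the documentation does not say whether names are compared case-insensitively.

```python
def classifier_name_error(name, existing_concepts):
    """Validation 1: return an error message, or None if the name is free.

    Assumption: names are compared exactly (case-sensitive).
    """
    if name in existing_concepts:
        return f"A concept named {name!r} already exists in this Semantic Object"
    return None


def move_warning(dataset_name, dataset_concepts, concept, classifier, training_names):
    """Validations 3 and 4: return a warning, or None if the Data Set may be
    moved to the right panel of the Project Data screen."""
    if dataset_name in training_names:
        # Validation 3: Training Data cannot also be Project Data.
        return f"{dataset_name} is already used as Training Data"
    # Validation 4: both the Concept to parse and the Classifier must be present.
    missing = [c for c in (concept, classifier) if c not in dataset_concepts]
    if missing:
        return f"{dataset_name} does not contain: {', '.join(missing)}"
    return None


# Hypothetical usage with an "Address" Concept and an "AddressPart" Classifier.
existing = {"Address", "Email"}
print(classifier_name_error("AddressPart", existing))  # None
print(classifier_name_error("Address", existing))
# A concept named 'Address' already exists in this Semantic Object
print(move_warning("customers_2023", {"Address", "AddressPart"},
                   "Address", "AddressPart", {"addresses_a"}))  # None
print(move_warning("products", {"SKU"}, "Address", "AddressPart", set()))
# products does not contain: Address, AddressPart
```

Returning a message (or `None`) rather than raising keeps each check non-blocking, which loosely matches the documented behavior of showing a warning instead of failing outright.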